NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Towards an open-source model for data and metadata standards

https://doi.org/10.31219/osf.io/br6u2

Rokem, Ariel; Mandava, Vani; Cristea, Nicoleta; Tambay, Anshul; Connolly, Andrew J (October 2024, Open Science Framework)

Progress in machine learning and artificial intelligence promises to advance research and understanding across a wide range of fields and activities. In tandem, increased awareness of the importance of open data for reproducibility and scientific transparency is making inroads in fields that have not traditionally produced large publicly available datasets. Data sharing requirements from publishers and funders, as well as from other stakeholders, have also created pressure to make datasets with research and/or public interest value available through digital repositories. However, to make the best use of existing data, and facilitate the creation of useful future datasets, robust, interoperable and usable standards need to evolve and adapt over time. The open-source development model provides significant potential benefits to the process of standard creation and adaptation. In particular, data and meta-data standards can use long-standing technical and socio-technical processes that have been key to managing the development of software, and which allow incorporating broad community input into the formulation of these standards. On the other hand, open-source models carry unique risks that need to be considered. This report surveys existing open-source standards development, addressing these benefits and risks. It outlines recommendations for standards developers, funders and other stakeholders on the path to robust, interoperable and usable open-source data and metadata standards.
more » « less
Full Text Available
An Efficient Shift-and-stack Algorithm Applied to Detection Catalogs

https://doi.org/10.3847/1538-3881/ae0e1a

Stetzler, Steven; Jurić, Mario; Bernardinelli, Pedro H; Bektešević, Dino; Chandler, Colin Orion; Connolly, Andrew J; Adams, Fred C; Fuentes, Cesar; Gerdes, David W; Holman, Matthew J; et al (November 2025, The Astronomical Journal)

Abstract The boundary of solar system object discovery lies in detecting its faintest members. However, their discovery in detection catalogs from imaging surveys is fundamentally limited by the practice of thresholding detections at signal-to-noise (SNR) ≥ 5 to maintain catalog purity. Faint moving objects can be recovered from survey images using the shift-and-stack algorithm, which coadds pixels from multi-epoch images along a candidate trajectory. Trajectories matching real objects accumulate signal coherently, enabling high-confidence detections of very faint moving objects. Applying shift-and-stack comes with high computational cost, which scales with target object velocity, typically limiting its use to searches for slow-moving objects in the outer solar system. This work introduces a modified shift-and-stack algorithm that trades sensitivity for speedup. Our algorithm stacks low-SNR detection catalogs instead of pixels, the sparsity of which enables approximations that reduce the number of stacks required. Our algorithm achieves real-world speedups of 10–10³× over image-based shift-and-stack while retaining the ability to find faint objects. We validate its performance by recovering synthetic inner and outer solar system objects injected into images from the DECam Ecliptic Exploration Project. Exploring the sensitivity–compute time trade-off of this algorithm, we find that our method achieves a speedup of ∼30× with 88% of the memory usage while sacrificing 0.25 mag in depth compared to image-based shift-and-stack. These speedups enable the broad application of shift-and-stack to large-scale imaging surveys and searches for faint inner solar system objects. We provide a reference implementation via thefind-asteroidsPython package and this URL:https://github.com/stevenstetzler/find-asteroids.
more » « less
Free, publicly-accessible full text available November 26, 2026
Software Engineering Practices in Academia: Promoting the 3Rs Readability, Resilience, and Reuse

https://doi.org/10.1162/99608f92.018bf012

Connolly, Andrew; Hellerstein, Joseph; Alterman, Naomi; Beck, David; Fatland, Rob; Lazowska, Ed; Mandava, Vani; Stone, Sarah (April 2023, Harvard data science review)

Full Text Available
From Data to Software to Science with the Rubin Observatory LSST

Breivik, Katelyn; Connolly, Andrew; Ford, Saavik; and, others (August 2022, https://arxiv.org/)

The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) dataset will dramatically alter our understanding of the Universe, from the origins of the Solar System to the nature of dark matter and dark energy. Much of this research will depend on the existence of robust, tested, and scalable algorithms, software, and services. Identifying and developing such tools ahead of time has the potential to significantly accelerate the delivery of early science from LSST. Developing these collaboratively, and making them broadly available, can enable more inclusive and equitable collaboration on LSST science.iv To facilitate such opportunities, a community workshop entitled “From Data to Software to Science with the Rubin Observatory LSST” was organized by the LSST Interdisciplinary Network for Collaboration and Computing (LINCC) and partners, and held at the Flatiron Institute in New York, March 28-30th 2022. The workshop included over 50 in-person attendees invited from over 300 applications. It identified seven key software areas of need: (i) scalable cross-matching and distributed joining of catalogs, (ii) robust photometric redshift determination, (iii) software for determination of selection functions, (iv) frameworks for scalable time-series analyses, (v) services for image access and reprocessing at scale, (vi) object image access (cutouts) and analysis at scale, and (vii) scalable job execution systems. This white paper summarizes the discussions of this workshop. It considers the motivating science use cases, identified cross-cutting algorithms, software, and services, their high-level technical specifications, and the principles of inclusive collaborations needed to develop them. We provide it as a useful roadmap of needs, as well as to spur action and collaboration between groups and individuals looking to develop reusable software for early LSST science.
more » « less
Full Text Available
The Astronomy Commons Platform: A Deployable Cloud-based Analysis Platform for Astronomy

https://doi.org/10.3847/1538-3881/ac77fb

Stetzler, Steven; Jurić, Mario; Boone, Kyle; Connolly, Andrew; Slater, Colin T.; Zečević, Petar (July 2022, The Astronomical Journal)

Abstract We present a scalable, cloud-based science platform solution designed to enable next-to-the-data analyses of terabyte-scale astronomical tabular data sets. The presented platform is built on Amazon Web Services (over Kubernetes and S3 abstraction layers), utilizes Apache Spark and the Astronomy eXtensions for Spark for parallel data analysis and manipulation, and provides the familiar JupyterHub web-accessible front end for user access. We outline the architecture of the analysis platform, provide implementation details and rationale for (and against) technology choices, verify scalability through strong and weak scaling tests, and demonstrate usability through an example science analysis of data from the Zwicky Transient Facility’s 1Bn+ light-curve catalog. Furthermore, we show how this system enables an end user to iteratively build analyses (in Python) that transparently scale processing with no need for end-user interaction. The system is designed to be deployable by astronomers with moderate cloud engineering knowledge, or (ideally) IT groups. Over the past 3 yr, it has been utilized to build science platforms for the DiRAC Institute, the ZTF partnership, the LSST Solar System Science Collaboration, and the LSST Interdisciplinary Network for Collaboration and Computing, as well as for numerous short-term events (with over 100 simultaneous users). In a live demo instance, the deployment scripts, source code, and cost calculators are accessible.⁴⁴http://hub.astronomycommons.org/
more » « less
Learning Spectral Templates for Photometric Redshift Estimation from Broadband Photometry

https://doi.org/10.3847/1538-3881/abb0e2

Crenshaw, John Franklin; Connolly, Andrew J. (October 2020, The Astronomical Journal)
null (Ed.)
Full Text Available
Sifting through the Static: Moving Object Detection in Difference Images

https://doi.org/10.3847/1538-3881/ac22ff

Smotherman, Hayden; Connolly, Andrew J.; Kalmbach, J. Bryce; Portillo, Stephen K.; Bektesevic, Dino; Eggl, Siegfried; Juric, Mario; Moeyens, Joachim; Whidden, Peter J. (November 2021, The Astronomical Journal)

Abstract Trans-Neptunian objects provide a window into the history of the solar system, but they can be challenging to observe due to their distance from the Sun and relatively low brightness. Here we report the detection of 75 moving objects that we could not link to any other known objects, the faintest of which has a VR magnitude of 25.02 ± 0.93 using the Kernel-Based Moving Object Detection (KBMOD) platform. We recover an additional 24 sources with previously known orbits. We place constraints on the barycentric distance, inclination, and longitude of ascending node of these objects. The unidentified objects have a median barycentric distance of 41.28 au, placing them in the outer solar system. The observed inclination and magnitude distribution of all detected objects is consistent with previously published KBO distributions. We describe extensions to KBMOD, including a robust percentile-based lightcurve filter, an in-line graphics-processing unit filter, new coadded stamp generation, and a convolutional neural network stamp filter, which allow KBMOD to take advantage of difference images. These enhancements mark a significant improvement in the readiness of KBMOD for deployment on future big data surveys such as LSST.
more » « less
Full Text Available
Toward Sampling for Deep Learning Model Diagnosis

https://doi.org/10.1109/ICDE48307.2020.00201

Mehta, Parmita; Portillo, Stephen; Balazinska, Magdalena; Connolly, Andrew (April 2020, 2020 IEEE 36th International Conference on Data Engineering (ICDE))

Deep learning (DL) models have achieved paradigm-changing performance in many fields with high dimensional data, such as images, audio, and text. However, the black-box nature of deep neural networks is not only a barrier to adoption in applications such as medical diagnosis, where interpretability is essential, but it also impedes diagnosis of under performing models. The task of diagnosing or explaining DL models requires the computation of additional artifacts, such as activation values and gradients. These artifacts are large in volume, and their computation, storage, and querying raise significant data management challenges. In this paper, we develop a novel data sampling technique that produces approximate but accurate results for these model debugging queries. Our sampling technique utilizes the lower dimension representation learned by the DL model and focuses on model decision boundaries for the data in this lower dimensional space.
more » « less
Full Text Available
The DECam Ecliptic Exploration Project (DEEP). III. Survey Characterization and Simulation Methods

https://doi.org/10.3847/1538-3881/ad1527

Bernardinelli, Pedro H.; Smotherman, Hayden; Langford, Zachary; Portillo, Stephen K. N.; Connolly, Andrew J.; Kalmbach, J. Bryce; Stetzler, Steven; Jurić, Mario; Oldroyd, William J.; Lin 林, Hsing Wen 省文; et al (February 2024, The Astronomical Journal)

Abstract We present a detailed study of the observational biases of the DECam Ecliptic Exploration Project’s B1 data release and survey simulation software that enables direct statistical comparisons between models and our data. We inject a synthetic population of objects into the images, and then subsequently recover them in the same processing as our real detections. This enables us to characterize the survey’s completeness as a function of apparent magnitudes and on-sky rates of motion. We study the statistically optimal functional form for the magnitude, and develop a methodology that can estimate the magnitude and rate efficiencies for all survey’s pointing groups simultaneously. We have determined that our peak completeness is on average 80% in each pointing group, and our magnitude drops to 25% of this value atm₂₅= 26.22. We describe the freely available survey simulation software and its methodology. We conclude by using it to infer that our effective search area for objects at 40 au is 14.8 deg², and that our lack of dynamically cold distant objects means that there at most 8 × 10³objects with 60 <a< 80 au and absolute magnitudesH≤ 8.
more » « less
Dimensionality Reduction of SDSS Spectra with Variational Autoencoders

https://doi.org/10.3847/1538-3881/ab9644

Portillo, Stephen K.; Parejko, John K.; Vergara, Jorge R.; Connolly, Andrew J. (July 2020, The Astronomical Journal)

Full Text Available

« Prev Next »

Search for: All records